Smart Paper Notes

Identification of Fine-Grained Location Mentions in Crisis Tweets

Sarthak Khanal, Maria Traskowsky, Doina Caragea

Categories: Natural Language Processing | Machine Learning

2021-11-11

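Identifying fine-grained location mentions in crisis tweets is central to transforming situational awareness information extracted from social media into actionable information. Most prior work has focused on identifying generic locations, without considering their specific types. To facilitate progress on the fine-grained location identification task, we assemble two crisis tweet datasets and manually annotate them with specific location types. The first dataset contains tweets from a mixed set of crisis events, while the second contains tweets from the global COVID-19 pandemic. We investigate the performance of state-of-the-art deep learning models for sequence tagging on these datasets, in both in-domain and cross-domain settings.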

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Categories: Machine Learning | Computer Vision

2022-12-27

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
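
The interpolation step described in the abstract can be sketched as follows. This is a minimal illustration of standard Mixup only, not the MixupE method proposed in the paper. The function names `mixup_pair` and `mixup_batch` are hypothetical, and drawing the mixing weight from a Beta(alpha, alpha) distribution is an assumption taken from common Mixup practice rather than from the abstract.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two (input, one-hot label) pairs with weight lam."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=1.0, rng=None):
    """Mix every sample in a batch with a randomly chosen partner.

    Assumption: one mixing weight per batch, sampled from Beta(alpha, alpha).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # partner index for each sample
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam


if __name__ == "__main__":
    xs = [[0.0, 2.0], [4.0, 6.0], [1.0, 1.0]]
    ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    xs_mix, ys_mix, lam = mixup_batch(xs, ys, alpha=0.4, rng=random.Random(0))
    print(lam, xs_mix, ys_mix)
```

Because the labels are interpolated with the same weight as the inputs, each mixed label remains a valid probability vector whenever the original labels are one-hot.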

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop

Manas Gupta, Sarthak Ketanbhai Modi, Hang Zhang, Joon Hei Lee, Joo Hwee Lim

Categories: Machine Learning

2022-12-09

Bio-inspired learning has been gaining popularity recently given that Backpropagation (BP) is not considered biologically plausible. Many algorithms have been proposed in the literature which are all more biologically plausible than BP. However, apart from overcoming the biological implausibility of BP, a strong motivation for using Bio-inspired algorithms remains lacking. In this study, we undertake a holistic comparison of BP vs. multiple Bio-inspired algorithms to answer the question of whether Bio-learning offers additional benefits over BP, rather than just biological plausibility. We test Bio-algorithms under different design choices such as access to only partial training data, resource constraints in terms of the number of training epochs, sparsification of the neural network parameters, and addition of noise to input samples. Through these experiments, we notably find two key advantages of Bio-algorithms over BP. Firstly, Bio-algorithms perform much better than BP when the entire training dataset is not supplied. Four of the five Bio-algorithms tested outperform BP by up to 5% accuracy when only 20% of the training dataset is available. Secondly, even when the full dataset is available, Bio-algorithms learn much more quickly and converge to a stable accuracy in far fewer training epochs than BP. Hebbian learning, specifically, is able to learn in just 5 epochs compared to around 100 epochs required by BP. These insights present practical reasons for utilising Bio-learning rather than just its biological plausibility and also point towards interesting new directions for future work on Bio-learning.

Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images

Sarthak Sharma, Unnikrishnan R. Nair, Udit Singh Parihar, Midhun Menon S, Srikanth Vidapanakal

Categories: Computer Vision | Robotics

2022-11-08

Autonomous driving requires efficient reasoning about the location and appearance of the different agents in the scene, which aids in downstream tasks such as object detection, object tracking, and path planning. The past few years have witnessed a surge in approaches that combine the different task-based modules of the classic self-driving stack into an End-to-End (E2E) trainable learning system. These approaches replace perception, prediction, and sensor fusion modules with a single contiguous module with a shared latent space embedding, from which one extracts a human-interpretable representation of the scene. One of the most popular representations is the Bird's-eye View (BEV), which expresses the location of different traffic participants in the ego vehicle frame from a top-down view. However, a BEV does not capture the chromatic appearance information of the participants. To overcome this limitation, we propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360-degree field of view (FOV). We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene, which can aid in downstream tasks such as object tracking and executing language-based commands. We test the efficacy of our approach on a synthetic dataset generated from CARLA. The code, data set, and results can be found at https://rebrand.ly/APP OCC-results.

Comparative analysis of segmentation and generative models for fingerprint retrieval task

Megh Patel, Devarsh Patel, Sarthak Patel

Categories: Computer Vision | Machine Learning

2022-09-13

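Biometric authentication such as fingerprints has become an integral part of modern technology for authenticating and verifying users. It is pervasive in more ways than most of us are aware of. However, the quality of fingerprint images deteriorates if the fingers are dirty, wet, or injured, or when sensors malfunction. Recovering the original fingerprint by removing the noise and inpainting to restructure the image is therefore crucial for authentication. This paper proposes a deep learning approach that addresses these issues using generative (GAN) and segmentation models. A qualitative and quantitative comparison is made between pix2pixGAN and CycleGAN (generative models) and U-Net (a segmentation model). To train the models, we created our own dataset, NFD (Noisy Fingerprint Dataset), meticulously built with different backgrounds as well as scratches in some images to make it more realistic and robust. In our research, the U-Net model performed better than the GAN networks.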

Application of image-to-image translation in improving pedestrian detection

Devarsh Patel, Sarthak Patel, Megh Patel

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-08

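The lack of effective target regions makes it difficult to perform several vision tasks in low-intensity light, including pedestrian recognition and image-to-image translation. In this situation, accumulating high-quality information through the combined use of infrared and visible images makes it possible to detect pedestrians even in low light. In this study, we apply advanced deep learning models such as pix2pixGAN and YOLOv7 to the LLVIP dataset, which contains visible-infrared image pairs for low-light vision. The dataset contains 33672 images, most of which were captured in dark scenes, and the pairs are tightly synchronized in time and location.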

EvolvingBehavior: Towards Co-Creative Evolution of Behavior Trees for Game NPCs

Nathan Partlan, Luis Soto, Jim Howe, Sarthak Shrivastava, Magy Seif El-Nasr, Stacy Marsella

Categories: Neural and Evolutionary Computing | Artificial Intelligence

2022-09-01

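To assist game developers in crafting game NPCs, we present EvolvingBehavior, a novel tool that uses genetic programming to evolve behavior trees in Unreal Engine 4. In an initial evaluation, we compare the evolved behavior with hand-crafted trees designed by our researchers and with randomly grown trees in a 3D survival game. We find that, in this context, EvolvingBehavior is capable of producing behavior that approaches the designer's goals. Finally, we discuss the implications and future avenues of exploration for co-creative game AI design tools, as well as the challenges and difficulties of behavior tree evolution.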

Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

Puneet Kumar, Sarthak Malik, Balasubramanian Raman

Categories: Computer Vision

2022-08-25

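This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features that lead to the prediction of particular emotion classes. The architecture of the proposed system was determined through intensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting the importance of each speech and image feature. We have also constructed a large-scale dataset (the IIT-R SIER dataset) consisting of speech utterances, corresponding images, and class labels, i.e., "anger," "happy," "hate," and "sad." The proposed system achieves 83.29% accuracy for emotion recognition. Its enhanced performance underscores the importance of utilizing complementary information from multiple modalities for emotion recognition.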

Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Insufficient Labelled Data

Puneet Kumar, Sarthak Malik, Balasubramanian Raman

Categories: Computer Vision

2022-08-24

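This paper proposes a multimodal emotion recognition system, VIsual Spoken Textual Additive Net (VISTA Net), to classify the emotions reflected by a multimodal input containing image, speech, and text into discrete classes. A new interpretability technique, K-Average Additive exPlanation (KAAP), is also developed to identify the important visual, spoken, and textual features that lead to the prediction of a particular emotion class. VISTA Net fuses information from the image, speech, and text modalities using a hybrid of early and late fusion. It automatically adjusts the weights of the intermediate outputs while computing their weighted average, without human intervention. The KAAP technique computes the contribution of each modality and its corresponding features toward predicting a particular emotion class. To mitigate the insufficiency of multimodal emotion datasets labeled with discrete emotion classes, we have constructed the large-scale IIT-R MMEmoRec dataset, consisting of real-life images, corresponding speech and text, and emotion labels ("angry," "happy," "hate," and "sad"). VISTA Net achieves 95.99% emotion recognition accuracy when considering the image, speech, and text modalities together, which is better than its performance when considering inputs from any one or two modalities.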
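
The idea of computing a weighted average over intermediate per-modality outputs can be illustrated with a small sketch. This is not the VISTA Net architecture, which the abstract does not specify. It assumes, purely for illustration, that each modality yields a probability vector over the four emotion classes and that the fusion weights come from a softmax over learnable scores. The names `softmax`, `weighted_late_fusion`, and `fusion_logits` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_late_fusion(modality_probs, fusion_logits):
    """Weighted average of per-modality class probabilities.

    modality_probs: one probability vector per modality (assumed outputs).
    fusion_logits: one score per modality; in a trained model these would be
    learned, here they are supplied by hand for illustration.
    """
    weights = softmax(fusion_logits)
    n_classes = len(modality_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, modality_probs))
        for c in range(n_classes)
    ]
    return fused, weights


if __name__ == "__main__":
    labels = ["angry", "happy", "hate", "sad"]
    probs = [
        [0.7, 0.1, 0.1, 0.1],  # hypothetical image branch output
        [0.4, 0.3, 0.2, 0.1],  # hypothetical speech branch output
        [0.1, 0.6, 0.2, 0.1],  # hypothetical text branch output
    ]
    fused, weights = weighted_late_fusion(probs, [0.0, 0.0, 0.0])
    print(labels[fused.index(max(fused))], fused, weights)
```

Since the softmax weights are non-negative and sum to one, the fused vector is again a probability distribution over the classes.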
